HYPERSONIC VEHICLE TRAJECTORY OPTIMIZATION AND CONTROL 


EXECUTIVE SUMMARY 

Two classes of neural networks have been developed for the study of hypersonic vehicle 
trajectory optimization and control. 

The first is called an ‘adaptive critic’. Its main distinguishing features are that 1) it needs no
external training, 2) it allows variability of initial conditions, and 3) it can serve as a feedback
controller. It is used to solve a ‘free final time’ two-point boundary value problem that maximizes
the mass at rocket burn-out while satisfying pre-specified burn-out conditions on velocity,
flight-path angle, and altitude. 

The second neural network is a recurrent network. An interesting feature of this network 
formulation is that when its inputs are the coefficients of the dynamics and control matrices, the 
network outputs are the Kalman sequences (with a quadratic cost function); the same network is also 
used for identifying the coefficients of the dynamics and control matrices. Consequently, we can use 
it to control a system whose parameters are uncertain. 

Numerical results are presented which illustrate the potential of these methods. 

I. BACKGROUND 


For the United States to maintain its leadership in space technology, cheaper means of space 
transportation, alternatives to the Space Shuttle, must be developed. To develop such an 
alternative, different configurations of hypersonic vehicles must be studied from the perspective of 
cost-effective performance. A major part of such a study involves the design of optimal mission 
trajectories and the control of the vehicle. Since the current state of knowledge of hypersonic vehicles 
(in atmospheric flight especially) is limited, it is imperative that any tool developed for trajectory 
optimization and control be usable with variations in flight parameters. Quite a few 
methods, both direct and indirect, are available in the existing literature for trajectory 
optimization and optimal control. However, they are either ill-suited for design or do not consider 
the design phase of a vehicle. First, a two-point boundary value problem typically must be solved 
for each scenario, which can consume an enormous amount of time when many 
combinations of scenarios are considered. Second, many trajectory optimization methods do not 
directly yield a feedback form of control that can be used in flight. 

In this study, two new neural-network-based approaches have been formulated to address 
these two problems. The resulting design technique enables the user to study optimal 
trajectories of hypersonic vehicles with a set of predetermined neural networks. For an envelope of 
scenarios, this approach is expected to yield near-optimal trajectories. We formulate the 
problem in such a way as to produce a feedback control directly. In the case of recurrent networks, 
the network produces the gain matrices used in a linearized control law. 

II. WHY NEURAL NETWORKS 

Direct and indirect methods of optimization require solving a separate problem for 
each set of initial conditions, that is, determining a separate solution for each possible initial 
condition of a given system. Dynamic programming is another method, one that determines the optimal 
control for a whole family of initial conditions. However, its usual solution methods become very 
difficult to apply to high-dimensional and nonlinear systems. In practice, these solution methods do 
not usually yield the control in feedback form as a function of the states either. 

Other methods of solution also have their advantages and disadvantages. Neighboring 
optimal control is beneficial in that the solution of a single two-point boundary value problem 
(TPBVP) allows an approximate solution over a limited range of initial conditions. The disadvantage 
is that approximation methods such as neighboring optimal control can fail at a distance from the 
original TPBVP solution. 

Currently, there is no unified mathematical formalism under which a controller can be 
designed for nonlinear systems. Techniques like feedback linearization have been used for a few 
nonlinear problems under limited conditions, such as an equal number of inputs and outputs. More 
rigorous and general solutions are available for linearized models; however, they are restricted by 
the assumption of linearity. Other available solutions for nonlinear controllers are highly 
problem-oriented. Consequently, we propose a formulation with neural networks which 1) solves 
a nonlinear control problem directly without any approximation to the system model (in the absence 
of a good model, this approach can synthesize a nonlinear model of the states), 2) yields a control law 
in feedback form as a function of the current states, and 3) maintains the same structure regardless 
of the type of problem (it handles linear problems as well). Such a formulation is afforded by the field 
of neural networks. In the following sections, we trace the development of neural networks, and of 
learning control in particular. 

III. LITERATURE REVIEW 

The development of intelligent control system design techniques has a long and rich history 
as does the field of control systems engineering in general. Neural network techniques have also been 
used in control systems for quite a long time but recently have become very popular. This section 
contains a brief survey of the history of control ranging from cybernetics in the 1940s through 
learning control systems and the beginning of neural control in the 1960s. The next important 
landmark occurred with the use of critic architectures in reinforcement learning systems. We 
conclude the section with a brief survey of current literature in neural control organized in the areas 
of system identification, nonlinear, adaptive, and optimal neural control. 

1. Cybernetics, Neural Networks, and Learning Control. Norbert Wiener is recognized 
as the father of cybernetics, a field which he describes as “the control and communication in the 
animal and in the machine” [1]. Cybernetics also provided some of the motivation for the 
development of control theory and neural networks during the 1950s and 1960s. For example, 
Ashby contributed two complementary monographs in cybernetics, Design for a Brain [2] and An 
Introduction to Cybernetics [3], which discussed control and communication in biological systems. 
In the former, Ashby gave an early implementation of an artificial neural network called the homeostat. 
The latter contribution was a careful development of cybernetics intended to popularize the 
technology. Topics discussed include feedback, stability, a black box theory for large systems, 
regulation and control in biological systems, and hierarchical control. 

K. S. Fu gives one of the first formal descriptions of learning control in [4]. A learning 
control system is a control system capable of modifying its behavior based on experience in order to 
maintain acceptable performance in the presence of uncertainties. Possible measures of performance 
include the amount of time required to adapt to changes and the evaluation of suitable performance 
indices. A learning control system is distinguished from an adaptive control system by its ability 
to recognize familiar patterns in a situation and, based on past experience, to adjust in order to 
improve performance. Adaptive control systems emphasize a control system's ability to react to new 
situations. 

Sklansky gives an early survey of learning control [5]. According to Sklansky, learning in the 
automatic control literature is associated with a hierarchical arrangement of three feedback loops. 
These are the controller, a system identifier or pattern recognizer, and a teacher. The pattern 
recognizer transforms observable quantities in the system into a fixed set of categories, each of which 
corresponds to a set of controller parameters. Categories are represented by fixed regions in an 
intermediate feature space. The teacher provides information to the pattern recognizer for adjusting 
the boundaries between categories in the feature space so that improved control system performance 
results. An adaptive control system uses only the first two loops. The learning loop, which 
distinguishes a learning control system from an adaptive control system, sends reinforcement signals 
in the form of a reward or a punishment to the pattern recognizer based on an assessment of current 
control system performance. 

The advantage of the learning loop is that it provides a means of training the pattern 
recognizer on-line. Sklansky describes five techniques for the design of learning control systems and 
notes their interrelationships and their connection to pattern classification. These techniques are 
decision theory, trainable threshold logic, hill climbing, sample set construction, and Markov chains. 
In the decision theoretic approach, the boundaries between classes are determined by estimating joint 
probability densities using measurements taken from the system during operation. The trainable 
threshold logic method which Sklansky describes is actually a precursor to the use of neural networks 
for control. In this method, category boundaries are moved by adjusting the weights in a weighted sum 
of the components of a feature vector; this weighted sum is then passed through a threshold function 
to produce a bipolar control signal. The teacher in a threshold logic learning system provides 
information for adjusting the weights in the categorizer. The sample set construction technique breaks 
categories into subcategories based on distances measured in the feature space. During training, a 
fixed set of prototype feature vectors is developed, with the subcategories given by open balls 
surrounding the prototypes. The category regions are then formed as the unions of subcategories. The 
boundary between categories is formed as a sequence of hyperplanes perpendicular to the line segments 
joining prototypes from different categories. 
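
The trainable threshold logic and sample set construction techniques can be illustrated with a short sketch. The Python below assumes a perceptron-style error-correction rule for the teacher's weight adjustments and Euclidean distance for the prototype categorizer; neither choice is taken from Sklansky [5], and the feature vectors and helper names (`threshold_unit`, `train_threshold_logic`, `nearest_prototype`) are hypothetical.

```python
import numpy as np


def threshold_unit(weights, features):
    """Bipolar control signal: +1 if the weighted sum of the features is >= 0, else -1."""
    return 1.0 if float(np.dot(weights, features)) >= 0.0 else -1.0


def train_threshold_logic(features, teacher_labels, max_epochs=200, lr=1.0):
    """Adjust the weights only when the unit disagrees with the teacher.

    The perceptron-style update is an assumption; the text says only that
    the teacher supplies information for adjusting the weights.
    """
    X = np.hstack([features, np.ones((len(features), 1))])  # constant input acts as a movable threshold
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x, d in zip(X, teacher_labels):
            if threshold_unit(w, x) != d:
                w += lr * d * x      # move the category boundary toward the teacher's answer
                mistakes += 1
        if mistakes == 0:            # every training pattern is categorized as the teacher wants
            break
    return w


def nearest_prototype(prototypes, prototype_labels, feature):
    """Sample-set-construction style categorizer: return the label of the closest prototype.

    With Euclidean distance, the boundary between two prototypes of different
    categories is the hyperplane bisecting the segment that joins them.
    """
    distances = np.linalg.norm(prototypes - feature, axis=1)
    return prototype_labels[int(np.argmin(distances))]


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors; the teacher wants +1 only when x1 + x2 > 1.5.
    F = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [-1, -1]], dtype=float)
    d = np.array([-1, -1, -1, 1, 1, -1], dtype=float)
    w = train_threshold_logic(F, d)
    print([threshold_unit(w, np.append(f, 1.0)) for f in F])
```

Because the hypothetical training set is linearly separable, the error-correction loop stops once every pattern falls on the side of the boundary the teacher specifies.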

Ideas from decision theory, trainable threshold logic, and sample set construction are 
prominent in the development of neural network theory. In 1966 Nikolic and Fu [6] describe an 
algorithm based on decision theory for on-line learning control of an unknown discrete time plant 
without an external teacher. Control actions are chosen from a finite set. The performance index is 
the conditional expectation of the instantaneous performance evaluations with respect to observed 
states and allowable control actions. The model used by Nikolic and Fu is very similar to Sklansky’s 
general learning control system and they include provisions for the case when the teacher does not 
have perfect knowledge of the plant being controlled. This work provides the foundation for later 
critic based schemes. 

Tsypkin also contributes to learning control systems based on decision theory and 
optimization. In an article about ‘self-learning’ [7], Tsypkin distinguishes between three methods for 
determining decision rules in the pattern recognizer. The first method assumes that statistical 
information is available in advance. In this case statistical decision theory can be used to determine 
the decision rule. In the second method, the designer assumes that a sequence of correctly classified 
patterns exists. In this case, the decision rule is determined based on data in the training set and the 
method is called learning with reinforcement. In the third case, no information is assumed initially 
and the decision rule is found using observed but unclassified patterns from the system. Tsypkin calls 
this third case self-learning. Extensions of the idea of self-learning in automatic systems applied to 
pattern recognition, identification, dual control, and the allocation of resources are discussed in a later 
work [8] and compiled into a text [9]. 

The improvement in performance with respect to given performance objectives and based on 
experience is a common theme in learning control. There are three components related to 
performance in the control system: 1) the specification of optimal performance objectives, 2) the 
assessment of the system’s level of performance, and 3) a means for improving performance over 
time. Cybernetics and learning control are based on the use of pattern recognition, optimization, and 
control of uncertain dynamic systems using biologically inspired models of intelligent behavior. 
Rudimentary neural networks in the form of linear threshold logic units have been used as an 
implementation medium for learning control systems cited above. We now turn to a discussion of 
a subclass of learning control systems called reinforcement learning systems which build in methods 
for assessing and improving control system performance. 

2. Learning with a Critic. The groundbreaking work on learning control in the 1960’s, 
along with studies in cybernetics, led to two decades of study of critic-based systems, and this 
study has been revived in the current decade. In 1970, Mendel and McLaren introduced 
a concept in learning control which they call reinforcement learning [10]. Reinforcement learning 
control is developed as a subclass of learning control discussed above with the addition of 
performance assessment and a method for modifying controller actions. The idea is to provide a 
means of control for unstructured environments where the plant model may not be known or where 
a complex performance measure is used [11]. In reinforcement learning systems, a critic is used to 
monitor plant inputs and outputs and to provide an evaluation signal which represents an indication 
of current performance to the controller. 

Widrow, Gupta and Maitra [12] describe the concept of the critic for adaptation of neural 
networks. Widrow et al. delineate three separate modes of learning. A supervised learning system, 
also known as learning with a teacher, modifies the parameters of the neural network using error 
between network output signals and the desired output signals. The assumption here is that the 
desired output signals corresponding to each input signal are known at the time that learning is taking 
place. In an unsupervised learning procedure, also called learning without a teacher or 
decision-directed learning, the parameter adjustments are not guided by knowledge of a desired output 
signal. Learning with a critic bridges the gap between the two previous methods. Learning with a critic does 
not assume that desired output signals are known for each input signal but rather that some indication 
can be made with respect to network performance over a series of trials. 

Barto, Sutton and Anderson [13] extend the idea of learning with a critic through the 
development of a learning system which includes both an adaptive critic element and an adaptive 
search element. As in the learning with a critic approach, explicit desired control actions are 
unknown. The objective is to provide control signals which tend to optimize a performance index. 
The purpose of the adaptive search element is to implement a trial-and-error procedure to associate 
control vectors with respective observations of the state of the system being controlled. The adaptive 
critic element receives a success/failure signal from an outside source as a result of a series of control 
actions. This signal is called an external reinforcement signal. The adaptive critic element also 
receives weighted signals from each of the state variables of the controlled system. The external 
reinforcement signal provides feedback for modifying the strengths of these connections. The 
adaptive critic element uses the external reinforcement signal and weighted state signals to provide 
a continuous evaluation of performance to help guide the search for appropriate control actions. 
Sutton calls the critic based adaptation algorithm the “Adaptive Heuristic Critic” and develops its 
application in credit assignment problems in his Ph.D. dissertation [14]. 

The implementation of the adaptive critic is based on Widrow’s method for learning with a 
critic but provides a higher level of feedback to the control system. Two sets of connection weights 
connecting two processing elements are adjusted during the learning procedure. This, in conjunction 
with the active search, distinguishes the adaptive critic architecture from previous work. The adaptive 
critic architecture is capable of learning to balance a pole mounted on a movable cart by applying 
control signals to the cart with no prior knowledge of the system to be controlled. This ability 
to determine control actions assuming no previous knowledge is a great strength of the adaptive critic 
architecture. The disadvantage of the architecture is that many failed trials occur before a successful 
run is completed. The cart-pole solution also depends on the partitioning of the problem state space 
into a finite number of regions. This partitioning may not be practical in problems where finer control 
is required. In this case the number of regions required may be too large for effective results. 
Examples of such problems include those with time-varying dynamics, tracking problems, and some 
nonlinear problems. 

Barto et al. [13] distinguish between supervised learning paradigms and the reinforcement 
learning used in the adaptive critic approach. In the supervised learning approach, training proceeds 
in several steps. First, an input pattern is presented to a neural network. An output response is 
produced based on the current parameters embedded within the network. The response is then 
compared with a desired response, and the error is used to modify the neural network parameters to 
improve its mapping. Reinforcement learning is based on an evaluation of the current network output 
in relationship with current external factors (states in a system, for example). This evaluation may be 
as simple as a binary decision indicating a reward for a proper response or punishment for an 
inappropriate response. Reinforcement learning requires lower-quality feedback than a supervised 
learning system does, which makes it useful in situations where a quantitative desired answer is 
not available. 

Werbos [15] defends the use of neural networks for control applications. He suggests that 
neural networks will be able to solve difficult problems faced by modern controls engineers including 
the real-time control of nonlinear, possibly unknown, systems with high noise levels and high 
throughput. Werbos describes five dominant paradigms for use in neural control systems. These are 
Supervised Control, Inverse Dynamics, Stabilization Systems, Backpropagation Through Time and 
Adaptive Critics with Reinforcement Learning. The Supervised Control architecture uses a neural 
network trained to map current state vectors to corresponding control vectors. In the inverse 
dynamics approach, observed system state is assumed to be a function of the current control and 
previous system state. The neural network is trained to invert the plant in order to provide control 
actions which lead to desired states. Stabilization systems are designed to provide stable control in 
tracking and regulator problems. Backpropagation through time depends on a plant model and a 
performance index written in terms of states and control actions. The neural network predicts a 
sequence of states given a sequence of control actions. The backpropagation algorithm then provides 
derivatives of the performance index which can be used to update control actions at each step along 
the way. Adaptive critic architectures and reinforcement learning are the focal point in [33]. Werbos 
describes systems based on the adaptive critic as an approximation to dynamic programming and 
presents the notion of the backpropagated critic. 

Jameson [16] claims to be the first to publish results using a backpropagated critic. The 
primary difference between the adaptive critic architecture of Barto, Sutton, and Anderson and the 
backpropagated critic is that the critic output is maximized, with gradient information passed through 
a plant model network to the controller so that future control actions can be improved. The purpose 
of the critic network in this architecture is to predict future reinforcement signals from the 
environment. The critic network and a model of the plant are used to calculate derivatives of the 
predicted reinforcement signal with respect to control actions. The control actions are then modified 
to improve performance. The prediction provided by the critic network is also improved by 
comparing the actual reinforcement signal with previously stored predictions. The backpropagated 
critic, like previous critic designs, assumes no knowledge of the plant and results are improved by 
making multiple attempts at a solution. 

Sofge and White [17] advocate the development of neural control architectures which can be 
adapted on-line for stable operation of unknown, nonlinear plants which may include noise in the 
feedback loop. They suggest that adaptive critic architectures may be used in manufacturing 
process control applications to provide flexibility and efficient adaptability to changes which 
occur during the life-cycle of equipment. The authors use an adaptive critic architecture based on 
Albus’ CMAC neural network [18] to do process control in a thermoplastic composite manufacturing 
process. According to Sofge and White, “the goal of on-line learning is the real-time optimization 
of a large scale non-linear process at minimal computational cost.” The authors have designed and 
built an adaptive critic system for control of manufacturing processes. 

Watkins gives a recent implementation of reinforcement learning called Q-learning in his 
dissertation [19]. Q-learning is based on the approximation of a real-valued function, which Watkins 
calls the Q-function. The Q-function maps the current plant state and control into an 
estimate of the future performance of the system. This estimate is based on the assumption that 
optimal control is applied to the plant from the next time instant forward. A Q-learning algorithm 
is an algorithm which iteratively improves the estimate of the Q-function. There is a close 
correspondence between Q-learning and dynamic programming used in the control of dynamical 
systems [20]. Bradtke [21] distinguishes between two types of Q-learning algorithms. Bradtke calls 
the form described above an optimizing Q-learning algorithm because it tries to learn the Q-function 
directly. A slightly modified form called the policy-based Q-learning algorithm tries to learn an 
optimal sequence of plant control inputs (the control policy). 
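
As an illustration of the Q-function update, the sketch below runs tabular Q-learning on a hypothetical five-state chain plant in which control 0 moves left, control 1 moves right, and reaching the last state earns a reward of 1. The step size, discount factor, and epsilon-greedy exploration are illustrative assumptions rather than values from Watkins [19]; the target r + gamma * max Q(s', .) encodes the assumption that optimal control is applied from the next time instant forward.

```python
import numpy as np

N_STATES, GOAL = 5, 4      # hypothetical chain of states 0..4; state 4 is the goal
LEFT, RIGHT = 0, 1


def step(state, action):
    """Deterministic toy plant: move one state left or right; reward 1 on reaching the goal."""
    nxt = max(state - 1, 0) if action == LEFT else min(state + 1, GOAL)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, max_steps=200, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, 2))   # Q[s, a]: estimated future return of control a in state s
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = int(rng.integers(2))                 # exploratory control
            else:                                        # greedy control, ties broken at random
                best = np.flatnonzero(Q[s] == Q[s].max())
                a = int(rng.choice(best))
            s_next, r, done = step(s, a)
            # Target assumes optimal (greedy) control from the next instant onward.
            target = r if done else r + gamma * Q[s_next].max()
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
            if done:
                break
    return Q


if __name__ == "__main__":
    Q = q_learning()
    print(Q[:GOAL].argmax(axis=1), Q[0, RIGHT])
```

For this toy plant the exact values are Q(s, right) = gamma^(3 - s), so the learned table should favor moving right in every non-goal state, with Q[0, RIGHT] approaching 0.9^3 = 0.729.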

Many recent control system applications of the ideas of reinforcement learning and adaptive 
critic architectures exist. Gullapalli describes a reinforcement learning algorithm for learning control. 
This method uses radial basis functions and the adjustable parameters of the network are means and 
variances of normal distribution functions. The method is applied to a simulated 3 degree-of-freedom 
robotic arm [22]. Stamenkovich uses adaptive critic and adaptive search elements for learning to 
guide a ship through a channel [23]. Shelton [24] demonstrates an adaptive critic design for 
controlling a truck with a CMAC (Cerebellar Model Articulation Controller [18]). Tham and Prager 
compare the adaptive heuristic critic algorithm with the Q-learning algorithm for obstacle avoidance 
and control in multi-linked robotic manipulators [25]. Gachet et al. present an adaptive heuristic 
critic based control system for learning goal-based behavior for autonomous robot control. The three 
types of behavior discussed are: 1) move to a goal state, 2) do surveillance, and 3) follow a specified 
path [26]. 

3. Neural Identification and Control. There has been an explosion of reported research 
in the use of neural networks in control systems in recent years. Bavarian [27] gives an introduction 
to the use of neural networks for intelligent control. Several monographs have been compiled, 
including a well-known work edited by Miller, Sutton, and Werbos [28]. White and Sofge have 
compiled a book which includes several chapters dealing with the use of neural networks in intelligent 
control systems [29]. Hunt et al. have produced a comprehensive survey of the field [30]. 

Psaltis, Sideris and Yamamura describe three possible architectures for neural control systems 
[31]. The indirect learning architecture attempts to invert the plant in order to provide control signals 
which track a given input signal. In the generalized learning architecture the desired plant input signal 
is assumed known and the neural network is trained to produce input signals for the next sampling 
interval given the current plant output. The result is an output feedback control. The third 
architecture is called the specialized learning architecture where the neural network is trained to 
provide control to track an input function by minimizing the tracking error. 

Levin and Narendra [32] present a theory for the design of neural control systems which 
stabilize nonlinear dynamic systems about an equilibrium point. This theory is based on nonlinear 
control theory. The article contains necessary background information in nonlinear control theory 
and many examples illustrating the interaction between nonlinear theory and the use of neural 
networks for stable regulation. Possible control methods for nonlinear systems include: 1) the use 
of a linear controller which assumes that the plant can be linearized about the operating point, 2) 
stabilizing control using feedback stabilization where a change in state variables and a feedback 
control law are used to transform a system into one which is linear about an operating point, and 3) 
direct stabilization through the use of a nonlinear control law. Neural control designs are given for 
the feedback stabilization and direct stabilization methods. 

As stated above, adaptive and learning control systems depend on the ability to identify plant 
dynamics. There have been a number of contributions in the use of neural networks for system 
identification. Narendra and Parthasarathy discuss feedforward and recurrent neural network 
structures for identification and control of systems [33]. The authors present a method for training 
recurrent neural networks and describe necessary assumptions for well posed neural control problems. 
Fernandez, Parlos, and Tsai investigate nonlinear system identification with neural networks by using 
a recurrent network to identify nonlinear dynamic systems in discrete time based on input-output 
measurements. The results are applied to the identification of boiler dynamics [34]. Polycarpou and 
Ioannou present a stability theory approach to synthesis and analysis of identification and control 
schemes in nonlinear systems using neural networks [35]. Both gradient and Lyapunov synthesis 
approaches are applied. 

Applications of neural networks in adaptive control have also been investigated by several 
researchers. Guez, Eilbert, and Kam [36] propose a neural network architecture for neural model 
reference adaptive control. This system adjusts feedback gains so that the closed loop time response 
matches a desired time response of a given reference model. Hoskins, Hwang and Vagners [37] use 
iterative inversion of a neural plant model to provide control signals to the plant. The method is 
applied to a problem in redundant manipulator kinematics, a model reference adaptive control system, 
and a linear mass-spring-damper system. Hoskins and Himmelblau use similar techniques with an 
emphasis on reinforcement learning applied to process control [38]. 

Goldenthal and Farrell [39] backpropagate the error between the actual plant and a reference 
model through a neural network model of the plant and then continue the backpropagation procedure 
through the controller network to update controller weights. The technique is demonstrated in a 
model reference neural adaptive control system applied to the cart-pole problem. To accomplish this, 
the backpropagation algorithm is extended so that the network can function as a closed-loop 
controller and force the closed-loop system to match the desired reference response. 

Lan and Chand also investigate the discrete time linear quadratic regulator problem [40]. 
They point out that the conventional solution of the problem is an off-line solution. The computed 
control history is stored and used later in an open loop control. The disadvantage to this approach 
is that it is not robust and does not work for time-varying systems. Lan and Chand formulate an 
augmented performance index with the linear constraint equations of the controlled system embedded. 
The augmented performance index is then related to parameters in the energy function of a Hopfield 
network [41]. The Hopfield network then minimizes the performance index in an iterative fashion 
producing the required optimal control. 
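
For comparison with the Hopfield formulation, the conventional off-line solution of the finite-horizon discrete-time LQR problem can be sketched as a backward Riccati sweep that computes the whole gain sequence before the plant runs. The cost weighting, the double-integrator plant, and the function names below are assumptions for illustration and are not taken from Lan and Chand [40].

```python
import numpy as np


def lqr_backward_sweep(A, B, Q, R, Qf, N):
    """Finite-horizon discrete LQR by a backward Riccati sweep.

    Minimizes sum_{k=0}^{N-1} (x_k' Q x_k + u_k' R u_k) + x_N' Qf x_N
    subject to x_{k+1} = A x_k + B u_k. Returns the gains K_0..K_{N-1}
    (u_k = -K_k x_k) and P_0, so the optimal cost from x_0 is x_0' P_0 x_0.
    """
    P = Qf
    gains = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)   # Riccati recursion, one step backward in time
        gains.append(K)
    gains.reverse()                     # computed from k = N-1 backward; store in forward order
    return gains, P


def simulate(A, B, Q, R, Qf, gains, x0):
    """Apply the stored gains as state feedback and accumulate the quadratic cost."""
    x, cost = x0, 0.0
    for K in gains:
        u = -K @ x
        cost += float(x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
    return cost + float(x @ Qf @ x)


if __name__ == "__main__":
    dt = 0.1   # hypothetical double-integrator plant, for illustration only
    A = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([[0.5 * dt**2], [dt]])
    Q, R, Qf = np.eye(2), np.array([[0.1]]), 10.0 * np.eye(2)
    gains, P0 = lqr_backward_sweep(A, B, Q, R, Qf, N=50)
    x0 = np.array([1.0, 0.0])
    print(simulate(A, B, Q, R, Qf, gains, x0), x0 @ P0 @ x0)
```

Because the stored gains are applied as state feedback, the simulated cost should equal x0' P0 x0 up to rounding; the time-varying gains are what must be recomputed whenever the plant matrices change.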

Iiguni, Sakai and Tokumaru [42] report a nonlinear regulator design which uses feedforward 
neural networks to augment a linear quadratic regulator design for a nonlinear plant with parameter 
uncertainties. The authors assume that the nonlinear plant can be modeled using a known linear state 
space model. This linear model is then used as the basis for a linear quadratic regulator (LQR) 
design. The LQR design procedure yields state feedback gains which minimize a quadratic 
performance index. The resulting regulator may be used with the actual plant; however, its range 
of optimal operation is limited. 

Bouzerdoum and Pattison give a method for mapping a class of optimization problems onto 
a recurrent neural network architecture [43]. The method minimizes a static quadratic performance 
index, 

$$ J(x) = \frac{1}{2} x^T Q x - x^T y \qquad (1) $$

with respect to vectors $x \in \mathbb{R}^n$ subject to the bound constraints

$$ \mu_i \le x_i \le \nu_i, \qquad i = 1, \ldots, n \qquad (2) $$

where the subscripts indicate components of the respective vectors. This static optimization problem 
has a known solution. However, a matrix inversion is necessary and this is computationally intensive 
for large dimensional spaces and difficult for ill-conditioned weighting matrices. The recurrent neural 
network solution provides a parallel implementation for solving the problem. 
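
The bound-constrained problem (1)-(2) can be solved without a matrix inversion by iterating a projected-gradient update, which is one common way to simulate such a recurrent network in discrete time. The sketch below is not necessarily the exact dynamics of Bouzerdoum and Pattison [43]; it assumes Q is symmetric positive definite and uses a fixed step of 1/lambda_max(Q).

```python
import numpy as np


def bounded_qp_network(Q, y, lower, upper, n_iter=2000):
    """Minimize J(x) = 0.5 x'Qx - x'y subject to lower <= x <= upper (Q symmetric positive definite).

    Each pass is one synchronous update of a discrete-time recurrent network:
    every unit takes a gradient step on J and is then clipped to its bounds.
    No inverse of Q is formed.
    """
    Q = np.asarray(Q, dtype=float)
    y = np.asarray(y, dtype=float)
    step = 1.0 / np.linalg.eigvalsh(Q).max()   # a step below 2 / lambda_max keeps the iteration stable
    x = np.clip(np.zeros_like(y), lower, upper)
    for _ in range(n_iter):
        x = np.clip(x - step * (Q @ x - y), lower, upper)   # gradient of J is Q x - y
    return x


if __name__ == "__main__":
    print(bounded_qp_network([[2.0, 0.5], [0.5, 1.0]], [1.0, 3.0], -1.0, 1.0))
```

For the example values, the unconstrained minimizer Q^{-1} y, roughly (-0.286, 3.143), violates the upper bound on x_2, so the iteration settles at (0.25, 1.0): x_2 sits on its bound and x_1 minimizes J along it.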

Antony and Acar develop algorithms for real-time optimal control of discrete systems with 
respect to a quadratic performance index over a finite time interval [44]. Problem formulations based 
on the discrete time Hamiltonian for linear and partially unknown nonlinear systems are given. The 
method depends on a model of the plant dynamics using a feedforward neural network. Two distinct 
methods are given. For the first method, control vectors at each sample instant are modified during 
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every iteration of the algorithm. The second method develops the optimal control by a backward 
sweep beginning at the final time. The second method has slower convergence rates but requires less 
storage and fewer computations during each iteration. 

In this research, we have formulated two types of neural networks. The first one is called an
‘adaptive critic’ architecture. The reasons for choosing this structure for formulating the hypersonic
vehicle optimal control problems are: 1) this structure obtains an optimal controller by solving
dynamic programming equations; 2) this approach (see Figure 1) has a supervisor (critic), which
critiques the outputs of the controller network, and a neural network controller, and therefore has
built-in fault tolerance; 3) unlike other forms of neurocontrollers, this approach needs NO external
training; 4) it is not an open-loop optimal controller but a feedback controller; and 5) it preserves
the same structure regardless of the problem (linear or nonlinear).

The adaptive critic method determines an optimal control law for a system by successively adapting
two networks, an action network and a critic network. The control law does not need to be determined
mathematically a priori. This method simultaneously computes the optimal control policy and adapts
the neural networks to it, for both linear and nonlinear systems. Moreover, the form of the control
does not need to be known in order to use this method. Since the control law is computed for a range
of initial conditions, this approach is well suited to design studies.

The second approach is to formulate a neural network for simultaneous identification and control,
using a modified form of the Hopfield neural network. The need for this network arose after the
customer indicated that there is a large level of uncertainty in the system parameters. We anticipated
this need during the second year and formulated the network while awaiting the POST3D program and
inputs. Research and development based on this approach are described in a conference paper, which
was presented at the 1996 Atmospheric Flight Mechanics Conference in July 1996 in San Diego, CA, and
is enclosed in the Appendix.

The remainder of this report first deals with the adaptive critic approach: problem formulation,
algorithm development, and results.

IV. PROBLEM FORMULATION
1. Statement of the General Problem 

In this study, a problem of the form (finite time with terminal constraints) is considered, in which
a cost function $J$, given by

$$ J = \phi(x(t_f)) + \int_0^{t_f} \psi(x(t), u(x))\, dt \qquad (3) $$

is minimized subject to the differential constraints

$$ \dot{x} = f(x, u) \qquad (4) $$

$$ t_f = \text{given}, \qquad x_0 = \text{given} \qquad (5) $$

Here $x$ is an $n$-dimensional state vector, $u$ is an $m$-dimensional control vector, and
$\phi(\cdot)$, $\psi(\cdot)$, and $f(\cdot)$ are linear or nonlinear functions of the state and/or
control; $x_0$ denotes the initial conditions and $t_f$ is the final time.
2. Dynamic Programming Background


We can rewrite Eq. (3) as

$$ J(x(t)) = U(x(t), u(x(t))) + \langle J(x(t+1)) \rangle \qquad (6) $$

Here, $J(x(t))$ is the cost associated with going from time $t$ to the final time, $U(x(t), u(x(t)))$
is the utility, which is the cost of going from time $t$ to time $t+1$, and $\langle J(x(t+1)) \rangle$
is assumed to be the minimum cost associated with going from time $t+1$ to the final time. If both
sides of the equation are differentiated with respect to $x(t)$ and we define

$$ \lambda(x(t)) \equiv \frac{\partial J(x(t))}{\partial x(t)} \qquad (7) $$

then

$$ \lambda(x(t)) = \frac{\partial U(x(t),u(t))}{\partial x(t)} + \frac{\partial U(x(t),u(t))}{\partial u(t)}\frac{\partial u(x(t))}{\partial x(t)} + \left\langle \lambda(x(t+1))\frac{\partial x(t+1)}{\partial x(t)} \right\rangle + \left\langle \lambda(x(t+1))\frac{\partial x(t+1)}{\partial u(t)}\frac{\partial u(x(t))}{\partial x(t)} \right\rangle \qquad (8) $$

From this it can be seen that if $\langle \lambda(x(t+1)) \rangle$, $U(x(t), u(t))$, and the system
model derivatives are known, then $\lambda(x(t))$ can be found.

Next, the optimality equation is defined as

$$ \frac{\partial J(x(t))}{\partial u(t)} = 0 \qquad (9) $$

Dynamic programming uses these equations to aid in solving an infinite-horizon problem or to
determine the control policy for a finite-horizon problem.
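Equation (8) is a chain-rule identity, so it holds for any differentiable policy, not only the optimal one. The Python sketch below checks it on a hypothetical scalar system $x(t+1) = a\,x(t) + b\,u(t)$ with an arbitrary linear policy $u(t) = k_t x(t)$, quadratic utility, and a quadratic terminal cost: it evaluates the cost-to-go of Eq. (6) directly and computes $\lambda(x(t))$ by the backward recursion of Eq. (8). All parameter values and function names are assumptions chosen for illustration.

```python
# Hypothetical scalar example: x(t+1) = a x(t) + b u(t), u(t) = k_t x(t),
# utility U = 0.5 (q x^2 + r u^2), terminal cost 0.5 s x(N)^2.
a, b, q, r, s = 0.9, 0.4, 1.0, 0.5, 2.0
gains = [-0.3, 0.1, -0.5, 0.2]        # arbitrary (not optimal) policy gains k_0..k_3
N = len(gains)

def cost_to_go(x, t):
    """J(x(t)) accumulated forward from time t, i.e. Eq. (6) unrolled."""
    total = 0.0
    for k in range(t, N):
        u = gains[k] * x
        total += 0.5 * (q * x * x + r * u * u)
        x = a * x + b * u
    return total + 0.5 * s * x * x

def costate(x, t):
    """lambda(x(t)) from the backward recursion of Eq. (8)."""
    xs = [x]                                    # states visited from time t onward
    for k in range(t, N):
        xs.append(a * xs[-1] + b * gains[k] * xs[-1])
    lam = s * xs[-1]                            # lambda(x(N)) = d(terminal cost)/dx
    for k in range(N - 1, t - 1, -1):
        xk = xs[k - t]
        uk, du_dx = gains[k] * xk, gains[k]
        lam = (q * xk + r * uk * du_dx          # dU/dx + dU/du * du/dx
               + lam * a                        # lambda(t+1) * dx(t+1)/dx(t)
               + lam * b * du_dx)               # lambda(t+1) * dx(t+1)/du * du/dx
    return lam
```

Because the cost-to-go here is quadratic in $x$, a central finite difference of `cost_to_go` reproduces `costate` to rounding error at every stage.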
3. Training Methods (Approximation Techniques)

This study uses Eqs. (8) and (9) to determine the optimal control policy. The basic training takes
place in two stages: the training of the action network (the network modeling $u(x(t))$) and the
training of the critic network (the network modeling, or approximating, $\lambda(x(t))$). Both
networks are assumed to be feedforward multilayer perceptron networks.

The schematics of the controller (action) and critic networks are presented in Figures 2 and 3. To
train the action network for time step $t$, first $x(t)$ is randomized and the action network outputs
$u(t)$. The system model is then used to find $x(t+1)$ and $\partial x(t+1)/\partial u(t)$. Next, the
critic from $t+1$ is used to find $\lambda(x(t+1))$. This information is used to update the action
network. This process is continued until a predetermined level of convergence is reached.

In order to train the critic network for time step $t$, $x(t)$ is randomized and the output of the
critic, $\lambda(x(t))$, is found. The action network from step $t$ calculates $u(t)$ and
$\partial u(t)/\partial x(t)$. The model is then used to find $\partial x(t+1)/\partial x(t)$,
$\partial x(t+1)/\partial u(t)$, and $x(t+1)$. The critic from step $t+1$ is then used to find
$\lambda(x(t+1))$. After this, Eq. (8) is used to find $\lambda(x(t))$, the target value for the
critic. This process is continued until a predetermined level of convergence is reached. In an
infinite-horizon problem, the training ends with one stage; however, for a finite-horizon problem,
such as this study, this series of steps is used at each stage. This process is explained in detail
in the next section.
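The two training stages can be illustrated on a toy problem whose exact answer is known. The sketch below assumes a scalar linear system with quadratic cost (not the hypersonic model) and replaces each network with a one-parameter linear least-squares fit (`fit_linear`, a hypothetical stand-in for network training). At each stage $k$, sweeping backward, it trains the action by driving the optimality condition (9) to zero using the stage-$(k+1)$ critic, then trains the critic with targets from Eq. (8). For this problem the results coincide with the discrete Riccati solution. The damping factor `beta` and all parameter values are illustrative assumptions.

```python
import numpy as np

# Hypothetical scalar LQ toy problem (not the hypersonic model):
#   x_{k+1} = a x_k + b u_k,  J = 0.5 s_N x_N^2 + sum_k 0.5 (q x_k^2 + r u_k^2)
a, b, q, r, s_N, N = 1.0, 0.5, 1.0, 1.0, 1.0, 5
rng = np.random.default_rng(0)

def fit_linear(x, y):
    """Least-squares slope of y ~ w x (stands in for training a network)."""
    return float(np.dot(x, y) / np.dot(x, x))

def adaptive_critic(n_samples=200, beta=0.5, tol=1e-12, max_iter=1000):
    action = [0.0] * N             # u_k(x) = action[k] * x
    critic = [0.0] * (N + 1)       # lambda_k(x) = critic[k] * x
    critic[N] = s_N                # lambda_N = d(terminal cost)/dx_N
    for k in range(N - 1, -1, -1):                 # backward sweep over stages
        x = rng.uniform(-1.0, 1.0, n_samples)      # randomized x(t)
        w = 0.0
        for _ in range(max_iter):                  # action training
            x_next = a * x + b * (w * x)           # model: x(t+1)
            lam_next = critic[k + 1] * x_next      # critic at t+1
            u_target = -b * lam_next / r           # solves r u + b lambda(t+1) = 0, lambda held fixed
            w_new = (1 - beta) * w + beta * fit_linear(x, u_target)
            done = abs(w_new - w) < tol
            w = w_new
            if done:
                break
        action[k] = w
        u = w * x                                  # critic training, target from Eq. (8)
        lam_next = critic[k + 1] * (a * x + b * u)
        lam_target = q * x + r * u * w + lam_next * (a + b * w)
        critic[k] = fit_linear(x, lam_target)
    return action, critic

def riccati():
    """Reference solution: discrete Riccati recursion for the same problem."""
    P, gains, costs = s_N, [0.0] * N, [0.0] * (N + 1)
    costs[N] = s_N
    for k in range(N - 1, -1, -1):
        K = a * b * P / (r + b * b * P)
        P = q + a * a * P - a * b * P * K
        gains[k], costs[k] = K, P
    return gains, costs
```

The damped update on `w` is only a convenient way to reach the fixed point of the optimality condition. With real neural networks the same loop would be run with gradient-based training instead of a closed-form slope.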

V. OPTIMIZATION/CONTROL 

The motivation for the formulation in this section comes from the customer's need to study the
trajectories of a certain vehicle from scramjet turn-off to the rocket burn-out conditions. The reason
for this is the uncertainty in the parameters of the earlier-stage designs. Consequently, there will be
an envelope of conditions from which the rocket will have to start and yet carry the payload to the
pre-specified burn-out conditions. It is assumed that the rocket burn-out conditions will ensure a
proper apogee through the coasting period.

The cost function is given by

(10)

where

$J$ = cost function to be minimized
$m$ = mass
$v$ = velocity
$\gamma$ = flightpath angle
$h$ = altitude
$S_i$ = weights on the final conditions

Subscripts

$f$ = final
$fD$ = desired final

Note that this cost function maximizes the final payload while ensuring that the velocity, the
flightpath angle, and the altitude at the final time are as close as possible to the desired burn-out
conditions. The equations of motion are given by
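$$ \dot{m} = -\frac{T}{g I_{sp}} \qquad (11) $$

$$ \dot{h} = v \sin\gamma \qquad (12) $$

$$ \dot{v} = \frac{T\cos\alpha - D}{m} - \frac{\mu \sin\gamma}{r^2} \qquad (13) $$

$$ \dot{\gamma} = \frac{T\sin\alpha + L}{m v} + \left(\frac{v}{r} - \frac{\mu}{r^2 v}\right)\cos\gamma \qquad (14) $$

where

$T$ = thrust
$L = K_1 \alpha \tfrac{1}{2}\rho v^2 S$ = lift
$D = (C_{D_0} + K_2 \alpha^2)\tfrac{1}{2}\rho v^2 S$ = drag
$\rho$ = atmospheric density
$S$ = reference area
$\mu$ = gravitational constant
$r = R_e + h$ = radial distance from the center of the Earth
$R_e$ = radius of the Earth
$I_{sp}$ = specific impulse
$\alpha$ = angle of attack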

A schematic of the scenario is presented in Figure 4. The final time is unknown; that is, this is a
‘free final time’ problem. There is no solution in the current literature for solving the ‘free final
time’ problem for an envelope of initial conditions (other than the general method of dynamic
programming).

In order to solve this problem with neural networks, we transform it into one in which altitude is the
independent variable. This step converts it into a problem that can be broken down into several
altitude segments; it also allows us to reach the final desired altitude in all cases. The initial
conditions for this scenario are the possible final conditions at scramjet termination.

This is a two-point boundary value problem in which the initial conditions are known but the final
conditions are not. Usually, such a problem is solved for a given set of initial conditions; in this
project, however, we develop an adaptive critic-based solution that solves the problem for an envelope
of initial conditions. By reformulating the model, we are able to remove altitude from the cost
function, since the final condition on altitude is satisfied exactly.

The reformulated equations of motion, with altitude $h$ as the independent variable, are given by

$$ \frac{dm}{dh} = -\frac{T}{g I_{sp}} \cdot \frac{1}{v \sin\gamma} \qquad (15) $$

$$ \frac{dv}{dh} = \frac{T\cos\alpha - (C_{D_0} + K_2\alpha^2)\tfrac{1}{2}\rho v^2 S}{m v \sin\gamma} - \frac{g}{v} \qquad (16) $$

where the drag coefficient has been approximated by a parabolic drag polar fitted by least squares,
and

$$ \frac{d\gamma}{dh} = \frac{T\sin\alpha + K_1\alpha\tfrac{1}{2}\rho v^2 S}{m v^2 \sin\gamma} + \left(\frac{v}{r} - \frac{g}{v}\right)\frac{\cos\gamma}{v\sin\gamma} \qquad (17) $$

In Eqs. (15)-(17), the lift coefficient $C_L$ has been approximated by a linear least-squares fit, and

$C_{D_0}, K_1, K_2$ = constants
$g$ = local acceleration due to gravity

In order to calculate the flight time, a fourth equation is added:

$$ \frac{dt}{dh} = \frac{1}{v \sin\gamma} \qquad (18) $$
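One way to check the change of independent variable is to compute the altitude derivatives (15)-(18) directly and compare them with the time derivatives (11)-(14) divided by $\dot{h}$. The Python sketch below does this. The vehicle data, the exponential density model `rho`, and the numerical constants are hypothetical placeholders, not the customer's test case.

```python
import math

# Hypothetical data (illustration only; not the customer's test case).
MU = 1.4076539e16                     # gravitational constant of the Earth, ft^3/s^2
R_E = 20.9e6                          # radius of the Earth, ft (rounded)
T, ISP, S = 2.0e5, 300.0, 200.0       # thrust (lbf), specific impulse (s), area (ft^2)
CD0, K1, K2 = 0.02, 2.0, 1.5          # drag-polar and lift-curve constants

def rho(h):
    """Assumed exponential atmosphere, slug/ft^3."""
    return 0.0023769 * math.exp(-h / 23800.0)

def lift_drag(h, v, alpha):
    q_bar = 0.5 * rho(h) * v * v
    return K1 * alpha * q_bar * S, (CD0 + K2 * alpha**2) * q_bar * S

def rates_time(m, h, v, gam, alpha):
    """(dm/dt, dh/dt, dv/dt, dgamma/dt) from Eqs. (11)-(14)."""
    r = R_E + h
    g = MU / r**2
    L, D = lift_drag(h, v, alpha)
    return (-T / (g * ISP),
            v * math.sin(gam),
            (T * math.cos(alpha) - D) / m - MU * math.sin(gam) / r**2,
            (T * math.sin(alpha) + L) / (m * v) + (v / r - g / v) * math.cos(gam))

def rates_altitude(m, h, v, gam, alpha):
    """(dm/dh, dv/dh, dgamma/dh, dt/dh) from Eqs. (15)-(18)."""
    r = R_E + h
    g = MU / r**2
    L, D = lift_drag(h, v, alpha)
    vs = v * math.sin(gam)
    return (-T / (g * ISP) / vs,
            (T * math.cos(alpha) - D) / (m * vs) - g / v,
            (T * math.sin(alpha) + L) / (m * v * vs) + (v / r - g / v) * math.cos(gam) / vs,
            1.0 / vs)
```

Each altitude rate should equal the corresponding time rate divided by $\dot{h} = v\sin\gamma$, which holds only while $\sin\gamma > 0$ (climbing flight).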


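For solutions with neural networks, we convert these nonlinear differential equations to single-step
discrete equations:

$$ m_{k+1} = m_k - \left(\frac{T_k}{g_k I_{sp} v_k \sin\gamma_k}\right)\Delta h \qquad (19) $$

$$ v_{k+1} = v_k + \left[\frac{T_k\cos\alpha_k - D_k}{m_k v_k \sin\gamma_k} - \frac{g_k}{v_k}\right]\Delta h \qquad (20) $$

$$ \gamma_{k+1} = \gamma_k + \left[\frac{T_k\sin\alpha_k + L_k}{m_k v_k^2 \sin\gamma_k} + \left(\frac{v_k}{r_k} - \frac{g_k}{v_k}\right)\frac{\cos\gamma_k}{v_k \sin\gamma_k}\right]\Delta h \qquad (21) $$

$$ t_{k+1} = t_k + \left(\frac{1}{v_k \sin\gamma_k}\right)\Delta h \qquad (22) $$

where

$\Delta h$ = step size in altitude
$k$ = stage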
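The single-step update of Eqs. (19)-(22) can be sketched in Python as follows. The vehicle data and the exponential atmosphere are hypothetical. So is the helper `propagate`, which marches through equal altitude stages with a user-supplied control law. Because altitude is the independent variable, the final altitude is reached exactly after the last stage.

```python
import math

MU, R_E = 1.4076539e16, 20.9e6        # ft^3/s^2, ft (assumed values)
T, ISP, S = 2.0e5, 300.0, 200.0       # hypothetical vehicle data
CD0, K1, K2 = 0.02, 2.0, 1.5

def rho(h):
    return 0.0023769 * math.exp(-h / 23800.0)      # assumed atmosphere model

def stage_step(m, v, gam, t, h, alpha, dh):
    """Eqs. (19)-(22): state at altitude h + dh from the state at altitude h."""
    r = R_E + h
    g = MU / r**2
    q_bar = 0.5 * rho(h) * v * v
    L = K1 * alpha * q_bar * S
    D = (CD0 + K2 * alpha**2) * q_bar * S
    vs = v * math.sin(gam)
    m_next = m - T / (g * ISP * vs) * dh
    v_next = v + ((T * math.cos(alpha) - D) / (m * vs) - g / v) * dh
    gam_next = gam + ((T * math.sin(alpha) + L) / (m * v * vs)
                      + (v / r - g / v) * math.cos(gam) / vs) * dh
    t_next = t + dh / vs
    return m_next, v_next, gam_next, t_next

def propagate(m, v, gam, h0, hf, n_stages, control):
    """March from h0 to hf in equal altitude stages; control(k, m, v, gam) -> alpha_k."""
    dh = (hf - h0) / n_stages
    t, h = 0.0, h0
    for k in range(n_stages):
        alpha = control(k, m, v, gam)
        m, v, gam, t = stage_step(m, v, gam, t, h, alpha, dh)
        h += dh
    return m, v, gam, t, h
```

For example, `propagate(5000.0, 6000.0, 0.5, 150000.0, 243600.0, 20, lambda k, m, v, gam: 0.1)` flies 20 stages at a constant angle of attack and ends at exactly 243,600 ft.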
The corresponding Hamiltonian of the optimization problem is

$$ H_k = \lambda_{m_{k+1}} m_{k+1} + \lambda_{v_{k+1}} v_{k+1} + \lambda_{\gamma_{k+1}} \gamma_{k+1} \qquad (23) $$

where $\lambda_{x_{k+1}}$ is the Lagrange multiplier for variable $x$ at stage $(k+1)$.


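The propagation equations for the Lagrange multipliers are obtained by partial differentiation of the
Hamiltonian with respect to the states:

$$ \lambda_{x_k} = \frac{\partial H_k}{\partial x_k}, \qquad x = [m, v, \gamma]^T \qquad (24) $$

Using Eqs. (19)-(21), together with $\partial D_k/\partial v_k = 2D_k/v_k$ and
$\partial L_k/\partial v_k = 2L_k/v_k$ (both are proportional to $v_k^2$), they are:

$$ \lambda_{m_k} = \lambda_{m_{k+1}} - \frac{\Delta h}{m_k^2 v_k \sin\gamma_k}\left[\lambda_{v_{k+1}}(T_k\cos\alpha_k - D_k) + \lambda_{\gamma_{k+1}}\frac{T_k\sin\alpha_k + L_k}{v_k}\right] \qquad (25) $$

$$ \lambda_{v_k} = \lambda_{v_{k+1}} + \frac{\Delta h}{v_k \sin\gamma_k}\left[\lambda_{m_{k+1}}\frac{T_k}{g_k I_{sp} v_k} - \lambda_{v_{k+1}}\left(\frac{T_k\cos\alpha_k + D_k}{m_k v_k} - \frac{g_k \sin\gamma_k}{v_k}\right) - \lambda_{\gamma_{k+1}}\left(\frac{2T_k\sin\alpha_k}{m_k v_k^2} - \frac{2g_k\cos\gamma_k}{v_k^2}\right)\right] \qquad (26) $$

$$ \lambda_{\gamma_k} = \lambda_{\gamma_{k+1}} + \frac{\Delta h}{v_k \sin\gamma_k}\left[\lambda_{m_{k+1}}\frac{T_k\cot\gamma_k}{g_k I_{sp}} - \lambda_{v_{k+1}}\frac{(T_k\cos\alpha_k - D_k)\cot\gamma_k}{m_k} - \lambda_{\gamma_{k+1}}\left(\frac{(T_k\sin\alpha_k + L_k)\cot\gamma_k}{m_k v_k} + \left(\frac{v_k}{r_k} - \frac{g_k}{v_k}\right)\frac{1}{\sin\gamma_k}\right)\right] \qquad (27) $$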
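Because $H_k$ in Eq. (23) is linear in the multipliers, Eqs. (25)-(27) amount to the transposed Jacobian of the one-step map (19)-(21) applied to $\lambda_{k+1}$. The Python sketch below implements Eqs. (25)-(27) in `costate_step`. Its check compares the result with a finite-difference Jacobian of the state map `step`. The vehicle data and the atmosphere model are hypothetical assumptions.

```python
import math

MU, R_E = 1.4076539e16, 20.9e6        # assumed constants (ft^3/s^2, ft)
T, ISP, S = 2.0e5, 300.0, 200.0       # hypothetical vehicle data
CD0, K1, K2 = 0.02, 2.0, 1.5

def rho(h):
    return 0.0023769 * math.exp(-h / 23800.0)      # assumed atmosphere model

def aero(h, v, alpha):
    q_bar = 0.5 * rho(h) * v * v
    return K1 * alpha * q_bar * S, (CD0 + K2 * alpha**2) * q_bar * S   # (L, D)

def step(x, h, alpha, dh):
    """State map of Eqs. (19)-(21) for x = (m, v, gamma)."""
    m, v, gam = x
    r = R_E + h
    g = MU / r**2
    L, D = aero(h, v, alpha)
    vs = v * math.sin(gam)
    return (m - T * dh / (g * ISP * vs),
            v + ((T * math.cos(alpha) - D) / (m * vs) - g / v) * dh,
            gam + ((T * math.sin(alpha) + L) / (m * v * vs)
                   + (v / r - g / v) * math.cos(gam) / vs) * dh)

def costate_step(lam_next, x, h, alpha, dh):
    """Eqs. (25)-(27): (lambda_m, lambda_v, lambda_gamma) at stage k from stage k+1."""
    lm, lv, lg = lam_next
    m, v, gam = x
    r = R_E + h
    g = MU / r**2
    L, D = aero(h, v, alpha)
    sa, ca = math.sin(alpha), math.cos(alpha)
    sg, cg = math.sin(gam), math.cos(gam)
    vs, cot = v * sg, cg / sg
    c = dh / vs
    lam_m = lm - dh / (m * m * vs) * (lv * (T * ca - D) + lg * (T * sa + L) / v)
    lam_v = lv + c * (lm * T / (g * ISP * v)
                      - lv * ((T * ca + D) / (m * v) - g * sg / v)
                      - lg * (2 * T * sa / (m * v * v) - 2 * g * cg / (v * v)))
    lam_g = lg + c * (lm * T * cot / (g * ISP)
                      - lv * (T * ca - D) * cot / m
                      - lg * ((T * sa + L) * cot / (m * v) + (v / r - g / v) / sg))
    return lam_m, lam_v, lam_g
```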
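Note that $\lambda_{k+1}$ is needed to solve for $\lambda_k$. The boundary conditions for the
multiplier equations are

$$ \lambda_{x_N} = \left.\frac{\partial J}{\partial x}\right|_{x = x_N} \qquad (28) $$

The optimal control is obtained by partially differentiating the Hamiltonian with respect to the
control. In our case, the angle of attack, $\alpha$, is the control variable. We get

$$ \frac{\partial H_k}{\partial \alpha_k} = 0 \qquad (29) $$

This gives (after multiplying through by $m_k v_k \sin\gamma_k / \Delta h$)

$$ -\lambda_{v_{k+1}}\left(T_k\sin\alpha_k + K_2\rho_k v_k^2 S\alpha_k\right) + \lambda_{\gamma_{k+1}}\frac{T_k\cos\alpha_k + K_1\tfrac{1}{2}\rho_k v_k^2 S}{v_k} = 0 \qquad (30) $$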


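First, we solve for the control at the $(N-1)$th stage, where $N$ is the preselected number of stages.
That is, after using the small-angle assumption ($\sin\alpha \approx \alpha$, $\cos\alpha \approx 1$),

$$ -\lambda_{v_N}\left(T_{N-1} + K_2\rho_{N-1}v_{N-1}^2 S\right)\alpha_{N-1} + \lambda_{\gamma_N}\frac{T_{N-1} + K_1\tfrac{1}{2}\rho_{N-1}v_{N-1}^2 S}{v_{N-1}} = 0 \qquad (31) $$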
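Note that

$$ \lambda_{v_N} = S_2\left(v_N - v_{fD}\right) \qquad (32) $$

$$ \lambda_{\gamma_N} = S_3\left(\gamma_N - \gamma_{fD}\right) \qquad (33) $$

By substituting for $\lambda_{v_N}$ and $\lambda_{\gamma_N}$ in Eq. (31), we get

$$ -S_2\left(v_N - v_{fD}\right)\left(T_{N-1} + K_2\rho_{N-1}v_{N-1}^2 S\right)\alpha_{N-1} + S_3\left(\gamma_N - \gamma_{fD}\right)\frac{T_{N-1} + K_1\tfrac{1}{2}\rho_{N-1}v_{N-1}^2 S}{v_{N-1}} = 0 \qquad (34) $$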


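We substitute for $v_N$ and $\gamma_N$ in Eq. (34) in terms of $v_{N-1}$ and $\gamma_{N-1}$ by using the
state propagation equations, Eqs. (20) and (21), with the small-angle assumption:

$$ -S_2\left\{v_{N-1} + \left[\frac{T_{N-1} - (C_{D_0} + K_2\alpha_{N-1}^2)q_{N-1}S}{m_{N-1}v_{N-1}\sin\gamma_{N-1}} - \frac{g_{N-1}}{v_{N-1}}\right]\Delta h - v_{fD}\right\}\left(T_{N-1} + 2K_2 q_{N-1}S\right)\alpha_{N-1} + S_3\left\{\gamma_{N-1} + \left[\frac{(T_{N-1} + K_1 q_{N-1}S)\alpha_{N-1}}{m_{N-1}v_{N-1}^2\sin\gamma_{N-1}} + \left(\frac{v_{N-1}}{r_{N-1}} - \frac{g_{N-1}}{v_{N-1}}\right)\frac{\cot\gamma_{N-1}}{v_{N-1}}\right]\Delta h - \gamma_{fD}\right\}\frac{T_{N-1} + K_1 q_{N-1}S}{v_{N-1}} = 0 \qquad (35) $$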
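where the dynamic pressure $q_{N-1}$ is

$$ q_{N-1} = \tfrac{1}{2}\rho_{N-1}v_{N-1}^2 \qquad (36) $$

This leads to a cubic equation in $\alpha_{N-1}$:

$$ T_2 T_3\,\alpha_{N-1}^3 - \left(T_1 T_3 - T_5 T_6\right)\alpha_{N-1} + T_4 T_5 = 0 \qquad (37) $$

where

$$ T_1 = S_2\left\{v_{N-1} + \left[\frac{T_{N-1} - C_{D_0}q_{N-1}S}{m_{N-1}v_{N-1}\sin\gamma_{N-1}} - \frac{g_{N-1}}{v_{N-1}}\right]\Delta h - v_{fD}\right\} \qquad (38) $$

$$ T_2 = \frac{S_2 K_2 q_{N-1} S\,\Delta h}{m_{N-1}v_{N-1}\sin\gamma_{N-1}} \qquad (39) $$

$$ T_3 = T_{N-1} + 2K_2 q_{N-1}S \qquad (40) $$

$$ T_4 = \frac{S_3}{v_{N-1}}\left\{\gamma_{N-1} + \left(\frac{v_{N-1}}{r_{N-1}} - \frac{g_{N-1}}{v_{N-1}}\right)\frac{\cot\gamma_{N-1}}{v_{N-1}}\Delta h - \gamma_{fD}\right\} \qquad (41) $$

$$ T_5 = T_{N-1} + K_1 q_{N-1}S \qquad (42) $$

$$ T_6 = \frac{S_3 T_5\,\Delta h}{m_{N-1}\sin\gamma_{N-1}\,v_{N-1}^3} \qquad (43) $$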
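The last-stage control can be computed by assembling $T_1, \ldots, T_6$ from Eqs. (38)-(43) and solving the cubic (37). In the Python sketch below, picking the real root of smallest magnitude (consistent with the small-angle assumption) is an assumption, and so are the numerical inputs, apart from $v_{fD}$ and $\gamma_{fD}$, which are taken from the numerical results section.

```python
import math
import numpy as np

def alpha_last_stage(m, v, gam, r, g, rho, dh, T, S, CD0, K1, K2,
                     S2, S3, v_fD, gam_fD):
    """Small-angle alpha_{N-1} from the cubic (37) with T1..T6 of Eqs. (38)-(43)."""
    q = 0.5 * rho * v * v                                   # Eq. (36)
    vs = v * math.sin(gam)
    T1 = S2 * (v + ((T - CD0 * q * S) / (m * vs) - g / v) * dh - v_fD)
    T2 = S2 * K2 * q * S * dh / (m * vs)
    T3 = T + 2.0 * K2 * q * S
    T4 = S3 / v * (gam + (v / r - g / v) * dh / (v * math.tan(gam)) - gam_fD)
    T5 = T + K1 * q * S
    T6 = S3 * T5 * dh / (m * v * v * vs)
    roots = np.roots([T2 * T3, 0.0, -(T1 * T3 - T5 * T6), T4 * T5])
    real = roots[np.abs(roots.imag) < 1e-9].real
    return float(real[np.argmin(np.abs(real))])   # assumed: smallest |alpha|

# Illustrative inputs (hypothetical except v_fD and gamma_fD).
params = dict(m=5000.0, v=7500.0, gam=0.3, r=20.9e6 + 240000.0, g=31.6,
              rho=0.0023769 * math.exp(-240000.0 / 23800.0), dh=3600.0,
              T=2.0e5, S=200.0, CD0=0.02, K1=2.0, K2=1.5,
              S2=1.0, S3=1.0e6, v_fD=7617.0, gam_fD=math.radians(16.636))
alpha = alpha_last_stage(**params)
```

A cubic with real coefficients always has at least one real root, so the selection step never comes up empty. Substituting the returned angle back into the small-angle form of Eq. (34) makes its two terms cancel.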
We can observe that all quantities are known in terms of quantities at stage $N-1$. That is,
$\alpha_{N-1}$ is available as a feedback control based on the states at $N-1$.

For all other stages $k$, we obtain the expression for the control in terms of the Lagrange multipliers
at $k+1$, which follows from Eq. (30) under the small-angle assumption:

$$ \alpha_k = \frac{\lambda_{\gamma_{k+1}}\left(T_k + K_1 q_k S\right)}{v_k\,\lambda_{v_{k+1}}\left(T_k + 2K_2 q_k S\right)} \qquad (44) $$
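Equation (44) is Eq. (30) solved for $\alpha_k$ after setting $\sin\alpha_k \approx \alpha_k$ and $\cos\alpha_k \approx 1$. The short Python sketch below evaluates Eq. (44) and, for comparison, the left side of the exact condition (30), so that the size of the small-angle error can be inspected. The multipliers and flight condition are illustrative assumptions.

```python
import math

def alpha_small_angle(lam_v_next, lam_g_next, v, q_bar, T, S, K1, K2):
    """Eq. (44): alpha_k from the stage-(k+1) multipliers (small-angle form)."""
    return (lam_g_next * (T + K1 * q_bar * S)
            / (v * lam_v_next * (T + 2.0 * K2 * q_bar * S)))

def stationarity(alpha, lam_v_next, lam_g_next, v, q_bar, T, S, K1, K2):
    """Left side of Eq. (30) with q_bar = 0.5*rho*v^2, without the small-angle step."""
    return (-lam_v_next * (T * math.sin(alpha) + 2.0 * K2 * q_bar * S * alpha)
            + lam_g_next * (T * math.cos(alpha) + K1 * q_bar * S) / v)

# Illustrative multipliers and flight condition (hypothetical values).
args = dict(lam_v_next=1.0, lam_g_next=3.0, v=7500.0, q_bar=2.8,
            T=2.0e5, S=200.0, K1=2.0, K2=1.5)
alpha = alpha_small_angle(**args)
residual = stationarity(alpha, **args)
```

For angles of this size the exact residual is a tiny fraction of either term in Eq. (30), which is why the small-angle form is adequate here.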


How do we construct the neural networks to solve this problem?

1. Solve for $\alpha_{N-1}$ in terms of $m_{N-1}$, $v_{N-1}$, $\gamma_{N-1}$. Generate various $\alpha_{N-1}$
by changing $m_{N-1}$, $v_{N-1}$, $\gamma_{N-1}$. Use a neural network to output $\alpha_{N-1}$ for the
inputs $m_{N-1}$, $v_{N-1}$, $\gamma_{N-1}$; call this the $\alpha_{N-1}$ network.

[We now have the optimal $\alpha_{N-1}$.]

2. In order to solve for $\alpha_k$ ($k = 0, 1, 2, \ldots, N-2$) as a function of $m_k$, $v_k$, $\gamma_k$,
we need $\lambda_{m_{k+1}}$, $\lambda_{v_{k+1}}$, $\lambda_{\gamma_{k+1}}$. So use $\lambda_N$, $m_{N-1}$,
$v_{N-1}$, $\gamma_{N-1}$, and $\alpha_{N-1}$ from step 1 to solve for $\lambda_{m_{N-1}}$,
$\lambda_{v_{N-1}}$, and $\lambda_{\gamma_{N-1}}$ using the $\lambda$ (backward) propagation equations,
Eqs. (25)-(27). Train a neural network with $m_{N-1}$, $v_{N-1}$, $\gamma_{N-1}$ as inputs and
$\lambda_{N-1}$ as output; call this the $\lambda_{N-1}$ network.

[We now have the optimal $\lambda_{N-1}$.]

How do we construct the other networks?

3. Assume different values of $m_{N-2}$, $v_{N-2}$, $\gamma_{N-2}$ and use a neural network to output
$\alpha_{N-2}$; initially this output will not be optimal. Use $m_{N-2}$, $v_{N-2}$, $\gamma_{N-2}$,
$\alpha_{N-2}$ in the state equations, Eqs. (19)-(21), to obtain $m$, $v$, $\gamma$ at $N-1$. Use these
states in the $\lambda_{N-1}$ network to output $\lambda_{N-1}$. Use this $\lambda_{N-1}$ in the optimal
$\alpha_k$ equation, Eq. (44), to compute $(\alpha_{N-2})_{target}$, and train the network toward it.
Continue this process until convergence.

[We now have the optimal $\alpha_{N-2}$.]

4. Assume different values of $m_{N-2}$, $v_{N-2}$, $\gamma_{N-2}$ and use them to get $\alpha_{N-2}$ from
the $\alpha_{N-2}$ network. Use all these in the state propagation equations to calculate the states at
$N-1$. Input these states into the $\lambda_{N-1}$ network to get $\lambda_{N-1}$. Use this
$\lambda_{N-1}$, together with the states and control at $N-2$, to find $\lambda_{N-2}$ from the
$\lambda$ propagation equations. Construct a $\lambda_{N-2}$ network to output $\lambda_{N-2}$ with
$m_{N-2}$, $v_{N-2}$, $\gamma_{N-2}$ as inputs.

[We now have the optimal $\lambda_{N-2}$.]

5. Assume different values of $m_{N-3}$, $v_{N-3}$, $\gamma_{N-3}$ and construct an $\alpha_{N-3}$ network
similar to the $\alpha_{N-2}$ network in step 3.

6. Construct a $\lambda_{N-3}$ network similar to the $\lambda_{N-2}$ network in step 4.

Continue this process for $k = N-1, N-2, \ldots, 0$.

How do we use these networks to generate an optimal trajectory from given initial conditions?
Assume any $m_0$, $v_0$, $\gamma_0$ [within the trained range]. Use the $\alpha_0$ neural network to find
the optimal $\alpha$ and integrate until the altitude $h_1$ of the $\alpha_1$ network is reached. Use the
resulting $m_1$, $v_1$, $\gamma_1$ values to find $\alpha_1$ from the $\alpha_1$ neural network and
integrate until $h_2$ is reached, and so on, until $h_f$ is reached.

Note that the forward integration can be done in terms of time, and that the Lagrange multiplier
networks, used in the controller synthesis, are not needed now.

VI. NUMERICAL RESULTS 

In order to verify the applicability of the adaptive critic approach to flexible trajectory
optimization, we used the rocket vehicle contained in a test case sent by the customer. We present the
results corresponding to two stages of neural-controlled trajectories to the burn-out of the rocket in
Figures 5-15. The desired end conditions are $v_f$ = 7617 ft/sec, $\gamma_f$ = 16.636 deg, and $h_f$ =
243,600 ft. In trying to match the final conditions, the values of $S_1$, $S_2$, and $S_3$ are chosen to
be 1, 1, and $10^6$. This means that we try to match the final flightpath angle much more closely than
we weight the maximization of final weight and the matching of the desired final velocity. The effect
of changes in the initial flightpath angle is presented in Figures 5-7: we fixed the initial velocity
and mass and changed the initial flightpath angle. It can be observed from Figure 5 that, after
following different velocity paths in Stage 1 for the first 12.2 seconds, all 10 paths tend to
converge; the same trend can be seen in Figure 6, which shows the flightpath angle histories. Because of
the relative emphasis on the flightpath angle, the flightpath angles converge more closely to the
desired final value than the velocities do. The weight history is almost invariant since the thrust is
almost constant. Figures 8-11 show the mass, flightpath angle, velocity, and altitude histories with
time when the initial mass is changed in steps. The effectiveness of this formulation is clear from the
flightpath angle history presented in Figure 9: even though the initial step (due to changes in mass)
leads to different flightpath angles, the control from the last stage brings them very close together.
Although the velocities appear divergent, they are scattered close to the desired final value. As
expected, the altitude history is nearly the same in all cases and satisfies the final condition.
Figures 12-15 show the state variable histories due to changes in the initial velocity. Because the
flightpath angle values diverge at the end of the first stage, the second-stage velocities show
apparent deviations from the desired value so that the resulting second-stage flightpath angles can be
closer to the desired value. The slight variations in the final altitude are due to the forward
integration in time, which we limited to 20.4 seconds.
VII. CONCLUSIONS


An approach to solving ‘free final time’ problems with an envelope of initial conditions has been
proposed. This approach, called ‘the adaptive critic’, consists of two neural networks at each stage,
developed in a backward sweep. After development, only the controller is used in the forward
integration of trajectories. Numerical results for the last stage of a launch vehicle trajectory
(provided by the customer) show that this approach works well and can be used in design. Further work
will involve integration with POST3D and consideration of the other phases of flight.
Network Training Diagram
Figure 4: Schematic of the Trajectory
Abstract

This paper presents a class of modified Hopfield neural networks and their use in
solving aircraft optimal control and identification problems. This class of networks
consists of parallel recurrent networks with variable dimensions that can be
changed to fit the problem under consideration. It has a structure that implements an
inverse transformation, which is essential for embedding optimal control gain sequences.
Equilibrium solutions are discussed. Energy minimization of the networks leads to
identification of the system parameters. Numerical results are provided to identify
the dynamics of an aircraft, and the corresponding optimal control is calculated on-line.
Comparison of the neural network solutions with point-wise optimal control
using an LQR formulation for this multivariable control problem shows nearly identical
results throughout the trajectories.


1 Introduction 

There has been a surge of activity in the area of
artificial neural networks (ANN) during the last ten
years. For a survey of the ANN work done in the
areas of identification and control, see the bibliography.
There are two types of networks used in almost all
ANN applications. The first is the more widespread
feedforward network and the second is the less
understood recurrent network. Feedforward networks,
in which data flow is unidirectional, are essentially
static; recurrent networks, on the other hand, are
based on feedback connections. Because of these
feedback connections, recurrent networks are better
suited for control problems that are based on
closed-loop solutions.

In this paper, a variation of the Hopfield network
is proposed. Compared with the classic Hopfield
network, it keeps the characteristic of energy
minimization, which is used to minimize the
identification errors. The mean-square error is used as a
performance criterion in system identification and
is formulated in an energy form to utilize the
network functionality. Based on the equilibrium
analysis, these networks can perform an inverse
transformation on matrices and other auxiliary
mathematical operations. This feature allows the networks
to produce optimal control gain sequences based on
the identified system parameters. In addition, this
class of networks has more degrees of freedom than
the classic Hopfield networks, and the network
architecture can be augmented according to the problem
at hand.

The modified Hopfield network is analyzed in
section 2. Its identification application is presented in
section 3, while the control application is in section
4. Both the principles and examples are given in
each individual section. Conclusions are presented
in section 5.

*Associate Fellow, AIAA (to whom all correspondence should be sent)

American Institute of Aeronautics and Astronautics


2 Modified Hopfield Networks 


2.1 Stability 

The modified Hopfield network is a variant of the 
classical Hopfield network. Fig (1) shows its basic 
features. 

We will demonstrate its stability by analyzing its
dynamics and using an energy function. The network
has two clusters of neurons. The right part of the
network is characterized by outputs that are
nonlinear functions $f$ of their states $s_j$:

$$ q_j = f(s_j) \qquad (1) $$

where

$$ s_j = \sum_{i=1}^{n} w_{ij} v_i - b_j, \quad j = 1, 2, \ldots, m \qquad (2) $$

with $b_j$ the exogenous input current and $v_i$ the
output of the left cluster of amplifiers. Conductance $w_{ij}$
connects the output of the $j$th neuron to the input
of the $i$th neuron, as indicated in Fig (1).

The left part of the network is characterized by
the dynamics. The amplifiers have input
conductances and capacitances denoted $g_i$ and $C_i$,
respectively. Both represent the amplifiers' parasitic
input impedance and are responsible for the
appropriate time-domain behavior of the entire network.
At the same time, we assume that the response time
of $f(\cdot)$ is negligibly small compared with that of the
amplifiers $g(u_i)$.

Under these assumptions, Kirchhoff's law gives
us

$$ C_i\frac{du_i}{dt} = -a_i - G_i u_i - \sum_{j=1}^{m} w_{ij} q_j \quad (i = 1, 2, \ldots, n) \qquad (3) $$

where $G_i$ denotes the sum of all conductances
connected to the input of the $i$th neuron and is equal
to

$$ G_i = g_i + \sum_{j=1}^{m} w_{ij} \qquad (4) $$

and $a_i$ is the exogenous input current.

Using Equation (1), the above formula can be
expressed as follows:

$$ C_i\frac{du_i}{dt} = -a_i - G_i u_i - \sum_{j=1}^{m} w_{ij}\, f\!\left(\sum_{k=1}^{n} w_{kj} v_k - b_j\right) \quad (i = 1, 2, \ldots, n) \qquad (5) $$

We now define the following Liapunov function
as an energy function $E$ for the modified Hopfield
networks:

$$ E(v) = \sum_{k=1}^{n} a_k v_k + \sum_{j=1}^{m} F\!\left(\sum_{k=1}^{n} w_{kj} v_k - b_j\right) + \sum_{k=1}^{n} G_k \int_0^{v_k} g^{-1}(v)\,dv \qquad (6) $$

Define

$$ F(s) = \int_0^{s} f(z)\,dz \qquad (7) $$

The components of the gradient vector of the
assumed energy function (6) can be expressed by
finding its derivatives as follows:

$$ \frac{\partial E(v)}{\partial v_i} = a_i + G_i u_i + \sum_{j=1}^{m} w_{ij}\, f\!\left(\sum_{k=1}^{n} w_{kj} v_k - b_j\right) = -C_i\frac{du_i}{dt} \qquad (8) $$

The time derivative of the energy function can
now be expressed using the above equations:

$$ \frac{dE}{dt} = \sum_{i=1}^{n}\frac{\partial E}{\partial v_i}\frac{dv_i}{dt} = -\sum_{i=1}^{n} C_i\frac{du_i}{dt}\frac{dv_i}{dt} = -\sum_{i=1}^{n} C_i\frac{d g^{-1}(v_i)}{dv_i}\left(\frac{dv_i}{dt}\right)^2 \qquad (9) $$

Since $C_i > 0$ and $g^{-1}(v_i)$ is a monotonically
increasing function, the sum on the right side of (9)
is nonnegative, and therefore $dE/dt < 0$ unless
$dv_i/dt = 0$ for all $i$, in which case $dE/dt = 0$. This
means that the evolution of the dynamic system (5) in
state space always seeks the minima of the energy
surface $E$. Integration of Eqs. (5) and (6) shows
that the outputs $v_i$ do follow gradient descent paths
on the $E$ surface.
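The energy argument can be checked numerically. The Python sketch below integrates Eq. (5) with explicit Euler steps for a small network and records $E$ from Eq. (6) along the path. It assumes $g = f = \tanh$, so that $g^{-1} = \operatorname{artanh}$ and $F(s) = \ln\cosh s$. Network sizes, weights, and currents are random, illustrative assumptions, and the recorded energy never increases.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2                       # left (dynamic) and right (static) neuron counts
W = rng.normal(size=(n, m))       # w_ij between left neuron i and right neuron j
a = rng.normal(size=n)            # exogenous currents a_i
b = rng.normal(size=m)            # exogenous currents b_j
G = np.full(n, 2.0)               # total input conductances G_i > 0
C = np.ones(n)                    # capacitances C_i > 0
g = f = np.tanh                   # assumed amplifier nonlinearities

def energy(v):
    """Eq. (6) with F(s) = log cosh s (so F' = tanh) and g^{-1} = arctanh."""
    s = W.T @ v - b
    int_ginv = v * np.arctanh(v) + 0.5 * np.log1p(-v**2)    # integral of arctanh from 0 to v
    return a @ v + np.log(np.cosh(s)).sum() + G @ int_ginv

def simulate(u0, dt=1e-3, steps=10000):
    """Explicit Euler integration of Eq. (5); records E along the path."""
    u, energies = u0.copy(), []
    for _ in range(steps):
        v = g(u)
        energies.append(energy(v))
        du = (-a - G * u - W @ f(W.T @ v - b)) / C
        u = u + dt * du
    return u, np.array(energies)

u_final, energies = simulate(np.zeros(n))
```

With a small step the discrete energy sequence inherits the monotone decrease of Eq. (9). With a much larger step, explicit Euler can overshoot and the check would fail even though the continuous-time argument still holds.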
Figure 1: Modified Hopfield Networks


2.2 Solution


In order to get an analytic expression for the
converged value of the network, we assume small
signals, so that the amplifiers work in their linear region.
Note that in the above derivation, nothing changes
if we denote the connection matrices in the left and
right adjoint subnets separately. These connection
matrices are nothing but the weights $w_{ij}$. Let the
right connection matrix be $D_1$ and the left connection
matrix be $D_2$; the stability conclusion still holds.
Under these mild assumptions, and with Kirchhoff's
law, we obtain the matrix relations

$$ C\frac{dU}{dt} = -a - GU - D_1^T Q \qquad (10) $$

$$ Q = f(D_2 V - b) = K_2\left(K_1 D_2 U - b\right) \qquad (11) $$

where $a$ and $b$ are the exogenous inputs of the
adjoint networks G and F, respectively, $U$ is the input
to G, and $V$ is the output of G. We also assume
that all amplifier gains in G are equal to $K_1$; similarly,
the gains of the amplifiers in F are equal to $K_2$. $K_1$ and
$K_2$ are scalars. Substituting Equation (11) into
Equation (10) gives

$$ C\frac{dU}{dt} = -a - GU - D_1^T K_2\left(K_1 D_2 U - b\right) \qquad (12) $$

$$ = -\left(G + K_1 K_2 D_1^T D_2\right)U + K_2 D_1^T b - a \qquad (13) $$

When the network reaches equilibrium,
$dU/dt = 0$, and

$$ V = K_1 U = K_1\left(G + K_1 K_2 D_1^T D_2\right)^{-1}\left(K_2 D_1^T b - a\right) \qquad (14) $$
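In the linear region the network should settle at Eq. (14). The Python sketch below integrates Eqs. (10)-(13) with explicit Euler steps and compares the result with the closed form. It assumes $C = I$ and that $G + K_1K_2D_1^TD_2$ has eigenvalues with positive real parts. This holds, for example, when $G$ is symmetric positive definite, $D_1 = D_2$, and $K_1K_2 > 0$. Function names and data are illustrative.

```python
import numpy as np

def simulate_linear_network(G, D1, D2, a, b, K1, K2, C=None, dt=1e-3, steps=20000):
    """Euler integration of Eqs. (10)-(13) in the linear region; returns V = K1 U."""
    n = G.shape[0]
    C = np.eye(n) if C is None else C
    U = np.zeros(n)
    for _ in range(steps):
        Q = K2 * (K1 * (D2 @ U) - b)                      # Eq. (11) with V = K1 U
        dU = np.linalg.solve(C, -a - G @ U - D1.T @ Q)    # Eq. (10)
        U = U + dt * dU
    return K1 * U

def equilibrium(G, D1, D2, a, b, K1, K2):
    """Closed-form equilibrium output, Eq. (14)."""
    M = G + K1 * K2 * (D1.T @ D2)
    return K1 * np.linalg.solve(M, K2 * (D1.T @ b) - a)

rng = np.random.default_rng(2)
D = rng.normal(size=(3, 3))
G, a, b = np.eye(3), rng.normal(size=3), rng.normal(size=3)
V_sim = simulate_linear_network(G, D, D, a, b, K1=1.0, K2=1.0)
V_eq = equilibrium(G, D, D, a, b, K1=1.0, K2=1.0)
```

The simulated output matches the closed form to within the remaining transient, which decays at least as fast as $e^{-t}$ here because the smallest eigenvalue of $G + D^TD$ is at least 1.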


2.3 Discussion

Equation (14) gives the general solution for the modified
Hopfield networks. Compared with the classical
Hopfield networks, an obvious feature is that it
involves more parameters. We may find applications
in which these parameters can be taken advantage of;
some of them can also be avoided, depending on
the desired objective.

Note that two factors are involved in the inverse
operation. As a result, the structure of this kind of
recurrent network is quite flexible: while the classical
Hopfield network is self-recurrent, that is, it feeds back
its own output, the variation is mutually recurrent,
that is, it feeds back the outputs of its two adjoint
parts. This architecture can easily be expanded
to three or four subnets or several layers
as needed. Some special applications may need that
computational relationship, but it is not needed for
the application considered here.

The dimensions of the parameters $a$, $b$, $D_1$, and $D_2$
depend on the application. $K_1$ and $K_2$ can also be
designed to provide appropriate magnitudes. If $K_2$
is large, then $G$ and $a$ will both have little effect on
the output $V$ and can be ignored. If we want $a$ to have a
reasonable influence on the expression while $G$ should
not, then we design $K_1$ large and choose $K_2$
according to the requirements on $a$.


3 System Identification 

3.1 Problem Formulation 

The proposed structure for system identification in
the time domain is shown in Fig (2). The dynamics
of the linear plant to be identified are defined by
the usual equation, where $A_p$ and $B_p$ are unknown
matrices and $x$ and $u$ are the state and control,
respectively:

$$ \dot{x} = A_p x + B_p u \qquad (15) $$

The dynamic equation of the system model depends
on $e$, the error vector between the actual
system states $x$ and the estimated values $y$:

$$ \dot{y} = A_s(e, t)\,x + B_s(e, t)\,u - Ke \qquad (16) $$

Therefore, the error dynamics equation is a function
of state and control:

$$ \dot{e} = (A_p - A_s)x + (B_p - B_s)u + Ke \qquad (17) $$

The goal is to minimize simultaneously the squared
error rates of all states using a Hopfield network.
To ensure global convergence of the parameters, the
energy function of the network must be quadratic
in terms of the parameter errors, $(A_p - A_s)$ and
$(B_p - B_s)$. However, the error rates $\dot{e}$ in Eq. (17)
are functions of both the parameter errors and the state
errors. The state error depends on $y$, which, in turn,
is influenced by $A_s$ and $B_s$. Hence, an energy function
based on $\dot{e}$ will have a recurrent relation with
$A_s$ and $B_s$. To avoid this, we use the following energy
function, where tr denotes the trace of a matrix
and $(\cdot)^T$ is the transpose of a matrix (see Raol,
Bibliography):

$$ E = \frac{1}{2}\int_0^T e_q(t)^T e_q(t)\,dt = \frac{1}{2}\int_0^T \left(\dot{x} - A_s x - B_s u\right)^T\left(\dot{x} - A_s x - B_s u\right)dt \qquad (18) $$

In order to facilitate the derivation, we expand
the terms in the factors of the energy function and
use the trace identities to simplify.




Equation (19) is quadratic in terms of $A_s$ and
$B_s$. Substituting $A_p x + B_p u$ for $\dot{x}$ in Eq. (19)
indicates that $E$ is also a quadratic function of the
parameter errors. Based on Eq. (19), we can program
a Hopfield network that has neurons with their
states representing the different elements of the $A_s$ and
$B_s$ matrices. From the convergence properties of the
Hopfield network, the equilibrium state is achieved
when the partial derivatives $\partial E/\partial A_s$ and $\partial E/\partial B_s$
are zero. We use the following identities to find the
partial derivatives of $E$:

$$ \frac{\partial}{\partial A}\operatorname{tr}\left(ABA^T\right) = 2AB \quad (B \text{ symmetric}) \qquad (20) $$

$$ \frac{\partial}{\partial A}\operatorname{tr}\left(ABD\right) = D^T B^T \qquad (21) $$

This results in the following, where $A_s^*$ and $B_s^*$
are the optimum solutions of the estimation problem.
Figure 3: Schematic of Longitudinal Flight


Define

$$ [w_{ij}] = \int_0^T \begin{bmatrix} I_4 \otimes xx^T & I_4 \otimes xu \\ I_4 \otimes ux^T & I_4 \otimes u^2 \end{bmatrix} dt $$

With these as weights and biases of the network,
the elements $a_{ij}$ of $A_s$ and $b_j$ of $B_s$ can be solved
through Eqs. (27) and (28). The derivation of $[w_{ij}]$ and
$[a_i]$ assumes that the neuron input conductance, $G_i$,
is low enough that the second term in Eq. (3)
can be neglected.
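The equilibrium condition $\partial E/\partial A_s = \partial E/\partial B_s = 0$ can be reached by any descent method on Eq. (18). The Python sketch below uses plain gradient descent on a sampled (discrete) version of the energy, with the integral replaced by an average over samples, as a software stand-in for the network relaxation. The plant matrices, the random excitation, and the learning rate are hypothetical, and the sampled state derivatives are assumed to be noise-free.

```python
import numpy as np

def identify(xs, xdots, us, lr=0.5, iters=2000):
    """Gradient descent on a sampled Eq. (18): E = 0.5 * mean ||xdot - A_s x - B_s u||^2."""
    n, m, N = xs.shape[1], us.shape[1], xs.shape[0]
    A_s, B_s = np.zeros((n, n)), np.zeros((n, m))
    for _ in range(iters):
        e_q = xdots - xs @ A_s.T - us @ B_s.T      # equation error, one row per sample
        A_s += lr * (e_q.T @ xs) / N               # step along -dE/dA_s
        B_s += lr * (e_q.T @ us) / N               # step along -dE/dB_s
    return A_s, B_s

rng = np.random.default_rng(3)
A_p = np.array([[-0.5, 1.0], [-2.0, -0.3]])        # hypothetical "unknown" plant
B_p = np.array([[0.0], [1.5]])
xs = rng.uniform(-1.0, 1.0, size=(400, 2))
us = rng.uniform(-1.0, 1.0, size=(400, 1))
xdots = xs @ A_p.T + us @ B_p.T                    # noise-free state derivatives
A_s, B_s = identify(xs, xdots, us)
```

Because the sampled energy is quadratic in the parameter errors and the excitation is rich, descent converges to the true $A_p$ and $B_p$. With measurement noise it would instead converge to the least-squares estimate.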


3.2 Numerical Example 

We present a representative numerical example to 
validate the capacities of the modified ELopfield net- 
works. The orientation of an aircraft involving longi- 
tudinal dynamics is shown in Fig (3). The linearized 
equations of motion of an aircraft in a vertical plane 
axe given by 

x = Ax -r Bu (30) 


The matrix A represents the dynamic stability 
derivatives and Is given by 


A P 


—0.0148 -13.38 -32.2 0 

-0.00019 -0.34 0 1 

0 001 

0.00005 -4.3 0 -0.5 


The matrix B represents the control derivatives and 
is given by 

r -i-i 1 



-3.74 


The control variable u represents eievator deflection. 

pig (4) shows the simulation results of the system 
identification. These figures represent oniy A P u, 
A p i2, ^ f? P 4 histones; similar results can be 
obtained for other elements of the A p and Bp matri- 
ces. From the numerical results shown in Fig (4), It 
is clear that the network is able to identify system 
parameters very well. 


4 Optimal Control Application 

4.1 Problem Formulation 

Let the plant to be controlled be described by the 
linear equation 

z k +i = AkXit - B k uk (32) 

with Zk £ R n and u^ € H. m - The associated perfor- 
mance index is the quadratic function 


where, the elements of the state space x are 

x = [u 1 a 9 q] T (31) 


iV-L 


Ji = ^Tf s v x * + i E ( x *Q kXk '■ UkRkUk }.„> 
2 - (3->) 
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7 igure 4: Identification History 


defined over the time interval of interest, $[i, N]$. Note
that both the plant and the cost-weighting matrices
can be time-varying. The initial plant state is given
as $x_i$. We assume that $Q_k$, $R_k$, and $S_N$ are symmetric
positive semidefinite matrices and, in addition, that
$|R_k| \neq 0$ for all $k$.

The objective is to find the control sequence
$u_k$, $k = i, \ldots, N-1$, that minimizes $J_i$.
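To make Eqs. (32) and (33) concrete, the sketch below simulates the plant for a given control sequence and accumulates the stage and terminal costs. It assumes time-invariant $A$, $B$, $Q$, $R$ for brevity (the formulation above allows time-varying matrices); the function name `quadratic_cost` is hypothetical.

```python
def quadratic_cost(A, B, Q, R, S_N, x0, controls):
    """Simulate x_{k+1} = A x_k + B u_k (Eq. 32) and return (J, x_N) per Eq. (33).

    Time-invariant matrices are assumed for brevity; all matrices are lists of rows,
    and each control u_k is a list of length m.
    """
    def mv(M, v):                       # matrix-vector product
        return [sum(a * b for a, b in zip(row, v)) for row in M]

    def quad(v, M):                     # quadratic form v^T M v
        return sum(a * b for a, b in zip(v, mv(M, v)))

    x = list(x0)
    J = 0.0
    for u in controls:                  # stage costs for k = i, ..., N-1
        J += 0.5 * (quad(x, Q) + quad(u, R))
        x = [a + b for a, b in zip(mv(A, x), mv(B, u))]
    J += 0.5 * quad(x, S_N)             # terminal cost (1/2) x_N^T S_N x_N
    return J, x


# Scalar example: x_0 = 1, one step with u_0 = -0.5.
J, xN = quadratic_cost([[1.0]], [[1.0]], [[1.0]], [[1.0]], [[1.0]], [1.0], [[-0.5]])
```

Here the stage cost is $\frac{1}{2}(1 + 0.25)$, the next state is $0.5$, and the terminal cost is $\frac{1}{2}(0.25)$, giving $J = 0.75$.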

To solve this linear quadratic regulator (LQR)
problem, we begin with the Hamiltonian function

$$H_k = \frac{1}{2}\left(x_k^T Q_k x_k + u_k^T R_k u_k\right) + \lambda_{k+1}^T \left(A_k x_k + B_k u_k\right) \qquad (34)$$

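Then we obtain the state and costate equations

$$x_{k+1} = \frac{\partial H_k}{\partial \lambda_{k+1}} = A_k x_k + B_k u_k \qquad (35)$$

$$\lambda_k = \frac{\partial H_k}{\partial x_k} = Q_k x_k + A_k^T \lambda_{k+1} \qquad (36)$$

and the stationarity condition

$$0 = \frac{\partial H_k}{\partial u_k} = R_k u_k + B_k^T \lambda_{k+1} \qquad (37)$$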


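This procedure finally leads to the control

$$u_k = -K_k x_k, \qquad k < N$$

where the Kalman gain $K_k$ is given by

$$K_k = \left(B_k^T S_{k+1} B_k + R_k\right)^{-1} B_k^T S_{k+1} A_k \qquad (38)$$

in terms of the Riccati variable $S_k$, which satisfies

$$S_k = A_k^T S_{k+1}\left(A_k - B_k K_k\right) + Q_k \qquad (39)$$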
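The step from Eqs. (35) to (37) to the gain and Riccati expressions is the standard sweep argument. A sketch, assuming the costate is linear in the state, $\lambda_k = S_k x_k$, with the boundary condition $\lambda_N = S_N x_N$ supplied by the terminal cost:

```latex
\begin{aligned}
0 &= R_k u_k + B_k^T S_{k+1} x_{k+1}
   = R_k u_k + B_k^T S_{k+1}\left(A_k x_k + B_k u_k\right) \\
\Rightarrow\; u_k &= -\left(B_k^T S_{k+1} B_k + R_k\right)^{-1} B_k^T S_{k+1} A_k \, x_k = -K_k x_k \\
S_k x_k &= Q_k x_k + A_k^T S_{k+1} x_{k+1}
         = \left[ A_k^T S_{k+1}\left(A_k - B_k K_k\right) + Q_k \right] x_k
\end{aligned}
```

The first line substitutes (35) into (37); the last line substitutes the control law into (35) and then into (36). Since the last identity must hold for every $x_k$, the bracketed term equals $S_k$, which is Eq. (39).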
Figure 5: Simulation plot


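When the control interval is finite, $S_N$ is given.
Using Equations (38) and (39) alternately, backward in
time from $k = N-1$, we obtain a sequence of gains $K_k$. The
gain matrix $K_k$ will generally be time-varying even
when the matrices $A_k$, $B_k$, $Q_k$, and $R_k$ are all
constant. If the control interval is infinite, the above
formulation needs to be modified slightly: under the usual
stabilizability and detectability conditions, the recursion for
$S_k$ converges to a steady-state solution, which yields a constant gain $K$.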
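The backward sweep can be sketched in a few lines of pure Python. This assumes a time-invariant plant with a single input ($m = 1$, as in the aircraft example), so the inverse in Eq. (38) reduces to a scalar division; the helper and function names are hypothetical, and this is a software illustration of the recursion, not the network implementation.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def add(X, Y, sign=1.0):
    """Elementwise X + sign * Y."""
    return [[a + sign * b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def riccati_gains(A, B, Q, R, S_N, N):
    """Backward sweep of Eqs. (38)-(39) for a time-invariant, single-input plant.

    A: n x n, B: n x 1, Q: n x n, R: positive scalar, S_N: n x n terminal weight.
    Returns the gains [K_0, ..., K_{N-1}] (each 1 x n) and S_0.
    """
    S = S_N                                    # holds S_{k+1}, starting from S_N
    gains = [None] * N
    At, Bt = transpose(A), transpose(B)
    for k in range(N - 1, -1, -1):
        BtS = matmul(Bt, S)                    # B^T S_{k+1}        (1 x n)
        scalar = matmul(BtS, B)[0][0] + R      # B^T S_{k+1} B + R  (1 x 1)
        K = [[v / scalar for v in matmul(BtS, A)[0]]]                   # Eq. (38)
        S = add(matmul(At, matmul(S, add(A, matmul(B, K), -1.0))), Q)   # Eq. (39)
        gains[k] = K
    return gains, S


# Scalar plant with A = B = Q = R = S_N = 1 over a long horizon.
gains, S0 = riccati_gains([[1.0]], [[1.0]], [[1.0]], 1.0, [[1.0]], N=50)
print(gains[-1][0][0], gains[0][0][0])  # last gain 0.5; early gains settle near 0.618
```

The last gain $K_{N-1}$ depends only on $S_N$, while gains far from the terminal time settle to a constant, which illustrates both the time-varying finite-interval case and the steady-state limit used for an infinite interval.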

4.2 Network Solution/Implementation 

We briefly discuss the recurrent network solution for the
optimal gain sequence.

Based on the recursions in Equations (38) and
(39), the most commonly encountered operations are
scalar and outer-product vector multiplications and
matrix-vector multiplications. The crucial operation,
however, is the matrix inversion needed to obtain the Kalman gain.
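What makes a network approach attractive here is that the inverse never has to be formed explicitly: a circuit whose stable state solves a linear system does the work. As a generic illustration (not the specific network of Eq. (14), whose form is not reproduced here), the sketch below integrates gradient-flow dynamics whose equilibrium is $M^{-1} c$. With $M = B_k^T S_{k+1} B_k + R_k$, which is symmetric positive definite when $R_k$ is, and $c$ taken as each column of $B_k^T S_{k+1} A_k$, the equilibria would be the columns of $K_k$. The function name, step size, and iteration count are hypothetical.

```python
def gradient_flow_solve(M, c, step=0.2, iters=200):
    """Euler-integrate the network-style dynamics dv/dt = -(M v - c).

    For symmetric positive-definite M this is gradient descent on the energy
    E(v) = 0.5 v^T M v - c^T v, whose unique minimum (the stable state) is M^{-1} c.
    Convergence requires step < 2 / lambda_max(M).
    """
    n = len(c)
    v = [0.0] * n
    for _ in range(iters):
        residual = [sum(M[i][j] * v[j] for j in range(n)) - c[i] for i in range(n)]
        v = [v[i] - step * residual[i] for i in range(n)]
    return v


# Exact answer for this 2 x 2 system is [1/11, 7/11].
v = gradient_flow_solve([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

The eigenvalues of this $M$ are about 4.62 and 2.38, so a step of 0.2 keeps every mode contracting; in an analog circuit the same role is played by the neuron time constants.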

The modified Hopfield networks contain both invariant
and variable parameters. Invariant parameters
are fixed in the neuron-computing model, while
variable parameters can be modified. Comparing
Eqs. (38) and (39) with the stable output of the network,
Eq. (14), shows that if the connection conductances $D_1$ and $D_2$
and the remaining network parameters are set from $S_{k+1}$, $B_k$, and $R_k$,
with $b = A_k$ and $a = 0$, the network
gives the Kalman sequence. Multiplying two signals
is not difficult for circuits. However, since $D_1$ and $D_2$ are
connection conductances, can they be changed by other
signals such as $S_{k+1}$ and $B_k$?

The answer is a voltage-controlled conductance. Such an
element can be implemented using a single field-effect
(e.g., MOS) transistor operating in its resistive (ohmic,
also called linear or triode) region, where the drain-source
conductance is set by the gate voltage. All signals are therefore
preferably represented as voltages. The
system parameters $A_k$ and $B_k$ are generally the outputs
of identification modules, which can conveniently
supply them as voltages. The optimal control
formulation does not restrict the $A_k$, $B_k$, $Q_k$, and $R_k$
matrices to be constant, and neither does the modified
Hopfield network: time-varying $A_k$, $B_k$, etc., are easily fed into
the network as voltage signals for use in the
computations.
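The triode-region behavior invoked above can be made concrete with the first-order (square-law) MOSFET model: for a small drain-source voltage, the channel conductance is approximately $g_{ds} \approx k'(W/L)(V_{GS} - V_T)$, i.e., linear in the gate voltage above threshold. The sketch below evaluates that approximation; the threshold voltage, process transconductance $k'$, and aspect ratio $W/L$ are hypothetical placeholder values, and real devices deviate from this idealized model.

```python
def triode_conductance(v_gs, v_t=0.7, k_prime=100e-6, w_over_l=10.0):
    """Approximate drain-source conductance (siemens) of an n-channel MOSFET.

    First-order square-law model in the triode region with V_DS -> 0:
        g_ds = k' * (W/L) * (V_GS - V_T)   for V_GS > V_T, else 0 (device off).
    Parameter defaults are hypothetical placeholders, not values from the paper.
    """
    overdrive = v_gs - v_t
    return k_prime * w_over_l * overdrive if overdrive > 0.0 else 0.0


# The gate voltage acts as the control signal that sets a connection conductance.
for v_gate in (0.5, 1.2, 1.7, 2.2):
    print(f"V_GS = {v_gate:.1f} V -> g_ds = {triode_conductance(v_gate) * 1e3:.2f} mS")
```

Because $g_{ds}$ tracks the gate voltage, a voltage signal such as an identified parameter can directly scale a synaptic conductance.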


4.3 Numerical Example 

We consider the synthesis of an optimal longitudinal
autopilot in this section. The performance index
in this application is an infinite-time quadratic cost
function. The minimizing control is expected to drive
the deviations of the longitudinal dynamics in pitch
angle $\theta$, pitch rate $q$, forward velocity $u$, and angle
of attack $\alpha$ to zero.
Figure 6: Control History


The system parameters are the same as in the identification
example. The performance index has the form





(40) 


where $Q$ and $R$ are appropriate weighting matrices.
We select $R = 91.32$ and

$$Q = \begin{bmatrix} 10.37 & 0 & 0 & 0 \\ 0 & 0.0004 & 0.0016 & 0 \\ 0 & 0.0016 & 7.25 & 0 \\ 0 & 0 & 0 & 14.34 \end{bmatrix}$$


The simulation plot is shown in Fig (5). The controls
calculated by the networks are compared with the
LQR results in Fig (6), and the state trajectories
are shown in Fig (7). The controls are
applied at 2 seconds.


5 Conclusion 

A class of modified Hopfield networks has been presented
to solve parameter identification and optimal
control problems. The architectures are designed
to perform energy minimization for system identification
and to implement a standard optimal control algorithm for
system control. As with the Hopfield network,
the stability of these modified networks is guaranteed,
but they provide more degrees of freedom and
flexibility to accommodate different applications. As
illustrations of these approaches, the parameters of a
four-dimensional aircraft model were identified and an
optimal controller was obtained for it. Future work on this
topic will investigate the robustness of such network
controllers and the use of these methods for other
relevant applications.